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Abstract: 

A simplified interaction potential for protein folding studies at the atomic level is 
discussed and tested on a set of peptides with about 20 residues each. The test set 
contains both a-helical (Trp cage, F s ) and /3-sheet (GBlp, GBlm2, GBlm3, Be- 
tanova, LLM) peptides. The model, which is entirely sequence-based, is able to fold 
these different peptides for one and the same choice of model parameters. Further- 
more, the melting behavior of the peptides is in good quantitative agreement with 
experimental data. Apparent folded populations obtained using different observables 
are compared, and are found to be very different for some of the peptides (e.g., Be- 
tanova). In other cases (in particular, GBlm2 and GBlm3), the different estimates 
agree reasonably well, indicating a more two-state-like melting behavior. 
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1 Introduction 



The function of peptides and proteins is inextricably connected to their folding be- 
havior, as is underlined by the facts that many neuro- degenerative disorders are being 
linked to misfolding and aggregation [1], and that coupled folding and binding seems 
to be a more common phenomenon than previously thought [2]. It is therefore an 
important development that folding simulations at the atomic level are now becoming 
feasible for short polypeptide chains [3], thanks to faster computers, more efficient 
algorithms and improved force fields. 

There are, however, questions about the interaction potentials used in the simulations 
that need further investigation. One difficulty is that different potentials give very 
different relative weights to the a-helix and /3-strand regions of the Ramachandran 
space [4]. A potential that successfully folds an a-helical peptide might therefore 
have problems with /9-sheet peptides, and vice versa. Another difficulty is with the 
temperature dependence of observable quantities. As pointed out by Zhou et al. [5], 
it seems that most current models need further calibration in order to give a temper- 
ature dependence that is not too weak; as a result, calculated melting temperatures 
are often unrealistically high. A systematic study of these thermodynamic questions 
requires extensive conformational sampling and is a challenge, especially in models 
with explicit water. 

Here we study a model that contains all atoms of the polypeptide chains but no 
explicit solvent molecules. Formally, such a model is obtained by integrating out 
the solvent degrees of freedom. Finding an accurate and computationally tractable 
approximation of the resulting effective potential is, however, a highly non-trivial 
problem. Examples of implicit solvent models that have been used in folding studies 
with some success, include the generalized Born approach [6], the method based on 
screened Coulomb potentials by Hassan et al. [7], and the method based on solvent 
accessible surface areas by Ferrara et al. [8]. In this paper, we study a minimalistic 
model in which the effects of the solvent are represented by an effective attraction 
between nonpolar side chains. Our study focuses on the thermodynamic behavior 
of this model, which we investigate using efficient Monte Carlo methods rather than 
molecular dynamics. This choice is made for computational convenience; with some 
minor modifications, it would be possible to study the same model using molecu- 
lar dynamics. Promising computational techniques have recently been proposed by 
Hansmann and Wille [9] and Schug et al. [10], but these methods are for energy 
minimization, which is insufficient for our purposes. 

In addition to effective hydrophobic attraction, the interaction potential of our model 
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contains two major terms, representing excluded-volume effects and hydrogen bond- 
ing. The potential is deliberately kept simple, partly for the sake of clarity but also 
for practical reasons; any potential requires careful calibration, and this task is easier 
with a simple potential like ours with fewer parameters to tune. In the future, the 
potential may be further developed with the inclusion of new terms such as Coulomb 
interactions between side-chain charges, but not before it becomes clear that they are 
needed. The different terms of the potential represent either the interaction between 
two individual atoms (excluded volume), or two pairs of atoms (e.g., hydrogen bonds), 
or an effective interaction between a pair of side chains (hydrophobicity) . The largest 
units playing a role in the potential are the amino acids, and no information about 
the sequence as a whole or its native structure is used in the potential. 

Our approach towards the problem of determining the interaction potential is phe- 
nomenological. The shape of individual terms is inspired by intuitive notions rather 
than being rigorously derived from a microscopic picture. Their exact functional 
forms and relative sizes are constrained by the effectiveness of the model in describ- 
ing the folding behavior of more and more sequences. When such a potential evolves 
to a point where it can successfully fold a significant number of peptides of different 
native geometries, and capture the thermodynamic behavior of all those peptides, it 
would be useful on its own as a working potential for thermodynamic studies of new 
sequences, and also provide hints about the relative importance of different physical 
effects in protein folding. 

We have previously shown that earlier versions of this model are able to fold both 
a-helix and /9-sheet peptides [11,12]. In this paper we present a further development 
of this model. We test the new model on the following set of peptides (see Fig. 1): 
the a-helical Trp cage [14] and F s [15,16], and the /3-sheet peptides GBlp [17,18], 
GBlm2 and GBlm3 [19], Betanova [20] and LLM [21]. Here GBlp denotes the 
C-terminal /3-hairpin from the protein G Bl domain, while Betanova is a designed 
three-stranded /5-sheet peptide. GBlm2 and GBlm3 are mutants of GBlp, while 
LLM is a mutant of Betanova, with enhanced stabilities. We find that our model 
provides a good description of the thermodynamic behavior of all these peptides. The 
same model was furthermore used in a recent study of the oligomerization properties 
of the amyloid A/9i 6 _ 2 2 peptide [22], with very promising results. 
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Figure 1: Schematic illustration of the different geometries of the peptides studied. 
Shown from left to right are the reference structures (see below) used for the Trp 
cage, F s , GBlm3 and Betanova. Drawn with RasMol [13]. 

2 Model and Methods 



2.1 Model 



Our model contains all atoms of the polypeptide chains, including hydrogen atoms. 
The model assumes fixed bond lengths, bond angles and peptide torsion angles (180°), 
so that each amino acid only has the Ramachandran torsion angles 0, ip and a 
number of side-chain torsion angles as its degrees of freedom. Numerical values of 
the geometrical parameters held constant can be found elsewhere [11]. 

In the simulations we internally use a dimensionless energy scale. The correspondence 
(constant factor) of this scale to the physical energy scale is determined by using 
the model prediction of the dimensionless energy value for an observable and the 
experimental value for the same. We use the melting temperature T m = 315 K of 
the Trp cage [14] for this purpose (see below), which is found to correspond to a 
dimensionless energy kT m of 0.470 in the model (k is Boltzmann's constant). Energy 
parameters of the model (such as the n ev , K\ oc , ej^, etc. below) are given in our 
internal energy scale. It must be emphasized that this energy scale is left unchanged 
when analyzing the other peptides. 

The interaction potential 

E = E ev + E\ oc + E hh + E hp (1) 
is composed of four terms. The first term in Eq. [TJ E ev , represents excluded- volume 
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effects and has the form 



B,=*,E[ A * (g '. +g ' ) r- (2) 



i<j 



r 'J 



where the summation is over pairs of atoms n ew = 0.10, and <jj = 1.77, 1.75, 

1.55, 1.42 and 1.00 A for S, C, N, O and H atoms, respectively. The values of the radii 
<7j agree reasonably well with the statistical analysis of Tsai et al. [23]. The <7j values 
for C, N and O strongly influence the shape of the Ramachandran 0, if> distribution, 
and must therefore be carefully chosen. The parameter Ajj in Eq. El has the value 
0.75 for all pairs except those connected by three covalent bonds, for which A^- = 1. 
The reason why we use a reduction factor A^ < 1 for all non-local pairs is both 
computational efficiency and the restricted flexibility of a chain with only torsional 
degrees of freedom, which could create artificial traps. To speed up the calculations, 
Eq. |21 is evaluated using a cutoff of r?- = 4.3Ay A, and pairs with fixed separation are 
omitted. 



The second energy term, E\ oc , has the form 

£l°c = «locI] ^Jlj > ( 3 ) 

where the inner sum represents the interactions between the partial charges of the 
backbone NH and CO groups in one amino acid, I. This potential is not used for 
Pro which lacks the NH group, or Gly which tends to be more exposed to water than 
other amino acids, due to the missing side chain. Neither is it used for the two end 
amino acids, unless these are protected by capping groups. The inner sum in Eq. |3] 
has four terms (NO, NC, HC and HO) which depend only on the <p an d ip angles 
for amino acid /. The partial charges are taken as q,i = ±0.20 for H and N and 
qi = ±0.42 for C and O [24], and we put /ti oc = 100, corresponding to a dielectric 
constant of e r ~ 2.5. 



The third term of the energy function is the hydrogen-bond energy E^, which has 
the form 

^hb = egj u ( r ij)v(aij, fa) + eg} ^ , ( 4 ) 

bb— bb sc— bb 

where the two functions u(r) and v(a,j3) are given by 

, „s f (cos a cos 3) 1/2 ifa,/5>90° 

= otherwise (6) 
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We consider only hydrogen bonds between NH and CO groups, and denotes the 
HO distance, the NHO angle and the HOC angle. The parameters e^, 
and <7hb are taken as 3.1, 2.0 and 2.0 A, respectively. The function u(r) is calculated 
using a cutoff of r c = 4.5 A. The first sum in Eq. 0] contains backbone-backbone 
interactions, while the second sum contains interactions between charged side chains 
(Asp, Glu, Lys and Arg) and the backbone. The latter type of interaction is taken 
to be effectively weak (e^, < e^), because there are competing interactions between 
the side-chain charges and the surrounding water that are omitted in the model. For 
the same reason, we do not include any term in corresponding to side chain- 

(2) 

side chain interactions. It is possible that the effective strength e^ b should be made 
stronger in case the side-chain charge gets shielded from the water. This context 
dependence is ignored in the model, which should be a reasonable approximation 
for small peptides. Hydrogen bonds between parts that are very close in sequence 
are rare in protein structures and therefore disregarded in the model; specifically, we 
disallow backbone NH (CO) groups to make hydrogen bonds with the two nearest 
backbone CO (NH) groups on each side of them, and we also forbid hydrogen bonds 
between the side chain of one amino acid with the nearest donor or acceptor on 
either side of its C a . As a simple form of context dependence, we assign a reduced 
strength to hydrogen bonds involving chain ends, which tend to be exposed to water. 
A hydrogen bond involving one or two end groups is reduced in strength by factors 
of 2 and 4, respectively. If there are capping groups, these groups are taken to be the 
end groups; otherwise, the two end amino acids take this role. 



The fourth energy term, E^ p , represents an effective hydrophobic attraction between 
nonpolar side chains. It has the pair-wise additive form 

E hp = -J2MuCu, (7) 

KJ 

where Cu is a measure of the degree of contact between side chains / and J, and 
Mjj sets the energy that a pair in full contact gets. The matrix Mjj is defined in 
Table 1. To calculate Cu we use a predetermined set of atoms, Aj, for each side 
chain I. We define Cjj as 

c -< = [ £ %»; 4) + £ %r«) ] • (8) 

ieAi ' jeAj 

where the function f(x) is given by f{x) = 1 if x < A, f(x) = if x > B, and 
f(x) = (B - x)/(B - A) if A < x < B [A = (3.5 A) 2 and B = (4.5 A) 2 ]. Roughly 
speaking, Cu is the fraction of atoms in Aj or Aj that are in contact with some 
atom from the other side chain. For Pro, the set Aj consists of the C^, C 7 and C$ 
atoms. The definition of Aj for the other hydrophobic side chains has been given 



6 



I II III 



I Ala 

II He, Leu, Met, Pro, Val 
III Phe, Trp, Tyr 



0.0 0.1 0.1 
0.9 2.8 
3.2 



Table 1: The hydrophobicity matrix Mjj. Hydrophobic amino acids are divided into 
three categories. The matrix Mjj represents the size of hydrophobicity interaction 
when an amino acid of type / is in contact with an amino acid of type J. 



elsewhere [11]. We expect the gain in forming a hydrophobic contact to be smaller if 
the two side chains are close in sequence, because such a pair is partly protected by 
the backbone. Therefore, we reduce the strength of the hydrophobic attraction for 
pairs that are nearest or next-nearest neighbors along the sequence; Mjj is reduced 
by a factor of 2 for next-nearest neighbors, and taken to be for nearest neighbors. 

The parameters of this potential were essentially determined by a somewhat tedious 
trial and error procedure, involving parallel simulations of the different peptides. 
The target was to have native-like free-energy minima for all the peptides at low 
temperature, whereas the temperature dependence was not considered at all. It is 
interesting to note that this criterion alone was sufficiently discriminating to yield 
parameter values that appear physically reasonable, as well as a realistic temperature 
dependence (see below). Some parameters, such as ej^, strongly influence the folding 

(2) 

properties of the model, and are therefore well determined. Others, such as ^jj-j , are 
less important and, as a result of this, quite poorly determined. 

The new version of the model differs from earlier versions in the precise form of 
the simple context dependence of E\ oc and E hh . Also, the reduction factor for the 
hydrophobic attraction between next-nearest neighbors along the chain has been 
changed. Furthermore, we have added Pro, which does not occur in any of our previ- 
ously studied sequences, to the list of hydrophobic amino acids. All other parameters 
of the potential are the same as in the last version of the model, except for a slight 
reduction in strength of the local potential (/ti oc )- 

It should be stressed that this potential is not expected to provide a good description 
of general amino acid sequences. For example, it is likely that the pair-wise additive 
hydrophobicity potential is inadequate for long chains, due to double-counting effects. 
For long chains, anti-cooperative multibody effects might play a significant role [25]. 
By extending the present calculations in the future to new and longer sequences, we 
hope that it will be possible to refine the potential and thereby make it more general. 
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2.2 Computational methods 



To study the thermodynamic behavior of this model, we use simulated temper- 
ing [26-28]. in which the temperature is a dynamical variable. For a review of sim- 
ulated tempering and other generalized-ensemble techniques for protein folding, see 
Hansmann and Okamoto [29]. We study eight different temperatures T^, which range 
from T min = 275 K to T max = 369 K and are given by T k = T min (T max /T min )( fe_1 ^ 7 
(k = 1,...,8). The average acceptance rate for the temperature jumps is about 
70%. 



Our simulations are carried out using two different elementary moves for the backbone 
degrees of freedom: first, the highly non-local pivot move in which a single backbone 
torsion angle is turned; and second, a semi-local method [30] that works with up to 
eight adjacent backbone degrees of freedom, which are turned in a coordinated man- 
ner. Side-chain angles are updated one by one. Every update involves a Metropolis 
accept /reject step, thus ensuring detailed balance. All our simulations are started 
from random configurations. All statistical errors quoted are la errors obtained from 
the variation between independent runs. For each peptide, we performed about 10 
independent runs. Each run contained 10 9 elementary Monte Carlo steps (1.5 • 10 9 
steps for GBlp) and required 1-2 CPU days on a 1.6 GHz computer. 

To characterize the folding behavior of the different peptides, we monitor several 
quantities. For a peptide with N amino acids, we define the a-helix content H as the 
fraction of the N — 2 inner amino acids with their Ramachandran (0, ip) pair in the 
region —90° < <fi < —30°, —77° < ip < —17°. We calculate the radius of gyration, 
R g , over the backbone atoms, with unit mass for all atoms. We also study root- 
mean-square deviations (RMSD) from folded reference structures, calculated over 
either the backbone atoms or all heavy atoms. A backbone RMSD is denoted by A b 
and a heavy-atom RMSD by A. For the /3-sheet peptides, there exist topologically 
distinct states that the backbone RMSD cannot discriminate between, which makes 
it necessary to use the heavy-atom RMSD. 



In our analysis of the results from the simulations, it turns out that the temperature 
dependence of a quantity X in many cases can be well described by the simple two- 
state expression 

X(T) = X ^ X ^ T ) (9) 
{ ' 1 + K(T) ■ [ ) 

Our fits to this equation are carried out by using a Levenberg-Marquardt proce- 
dure [31]. Throughout the paper, the baselines X u and X n are taken to be temper- 
ature independent, whereas the effective equilibrium constant K(T) is assumed to 
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have the first-order form K(T) = exp[(l//cT — l/kT m )AE], where T m is the midpoint 
temperature and AE = E n — E n is the energy difference between the unfolded and 
native states. With these assumptions, a fit to Eq. El has four parameters: AE, T m , 
X,, and X n . 



3 Results and Discussion 

Using the model and methods described in the previous section, we performed high- 
statistics thermodynamic simulations of the peptides mentioned in the introduction, 
namely the Trp cage, F s , GBlp, GBlm2, GBlm3, Betanova and LLM. In this section 
we present the results of these calculations. 



3.1 Trp cage 

The optimized 20-residue Trp cage (NLYIQWLKDGGPSSGRPPPS) is a "minipro- 
tein" with a compact folded state and a melting temperature of 315 K, as determined 
by circular dichroism (CD) and NMR measurements [14]. The NMR-derived na- 
tive structure [14] contains a short a-helix (residues 2-8), a single turn of 3io-helix 
(residues 11-14), and a hydrophobic core consisting of three proline residues (Prol2, 
Prol8, Prol9) and two aromatic residues (Tyr3, Trp6). The folding time is a few /is 
at room temperature [32] . Its small size, fast folding and relative stability makes the 
Trp cage an ideal testbed for computational methods, and folding simulations of this 
peptide were reported by several groups [10,33-36]. Two of these groups performed 
thermodynamic studies [35,36]. Both groups made detailed comparisons with raw 
NMR data with very good results, but the calculated melting temperatures were too 
high (> 400 K). 

In our model the melting temperature of the Trp cage is, by definition, equal to its 
experimental value, since we use this quantity to set the energy scale of the model. For 
this purpose, we consider the helix content H, as defined in the previous section, which 
should be strongly correlated with the CD signal studied experimentally. Fig. 2a 
shows our results for H against temperature. A fit to the data with the two-state 
expression in Eq. is also shown. As can be seen in the figure, the two-state fit 
provides an excellent description of the data. The midpoint temperature from this 
fit, T m , is set to 315 K, the experimental melting temperature. Having done that, 
there is no free parameter left in the model. The fitted value of the parameter 
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(b) 



275 300 325 350 375 2 4 6 8 10 

T (K) A b (A) 

Figure 2: The Trp cage, (a) Helix content against temperature. The line is a fit to 
Eq. E3(T m = 315 K, AE = 11.5±0.2kcal/mol). Statistical errors are smaller than the 
plot symbols, (b) Contour plot of the free energy F(A\,,E) at 275 K. The contours 
are spaced at intervals of 1 kT. Contours more than 6 kT above the minimum free 
energy are not shown. The free energy F(A^, E) is defined by exp[— F(A\>, E)/kT] oc 
P(Ab, E), where P(Ab, E) denotes the joint probability distribution of Ab and E at 
temperature T. 




275 300 325 350 375 

T(K) 

Figure 3: Native population against temperature for the Trp cage. The line is the 
result obtained from the model, through the fit shown in Fig. EK- Plot symbols show 
experimental results [14] based on CD (o) and NMR (•), respectively. 

AE = 11.5 ± 0.2kcal/mol is, in contrast to that of T m , not used for calibration, but 
is rather a prediction of the model. 

In the two-state picture (Eq. EJ), the native population at temperature T is given by 
l/{l+exp[— (1/kT— l/kT m )AE]}. Fig. 3 shows the native population obtained using 
the above mentioned AE and T m , against temperature, along with experimental 
values based on CD and NMR [14]. We see that the results obtained from the model 
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are in good agreement with the experimental data over the entire temperature range, 
with a maximum deviation of ~ 5 % at the lowest temperatures. With the overall 
energy scale properly determined, we thus find that the melting behavior of this 
peptide is well described by the model. 

At low temperature, we find a helix content similar to that of the NMR structure, 
~30% (see Fig. 2a). An RMSD analysis confirms that the typical low-temperature 
structure is similar to the NMR structure (PDB code 1L2Y, first model), as illustrated 
in Fig. 2b. This figure shows the free energy F(A b , E) calculated as a function of 
the backbone RMSD A b (residues 2-19) and the energy E, at 275 K. We see that 
F(Ab, E) has a simple shape with one dominating minimum, which is located at 
A b « 2.3 A. 



3.2 F s 

The designed 21-residue F s peptide is given by Suc-A 5 (AAARA) 3 A-NH 2 , (where Sue 
is succinylic acid) and makes an a-helix [15,16]. Other N-capping groups than Sue 
have also been used in the experiments on this peptide. The melting behavior of 
F s was studied using CD as well as infrared (IR) spectroscopy. The melting tem- 
perature measured by IR was 334 K [37], whereas the CD-based studies obtained 
T m = 308 K [16] and T m = 303 K [38]. Computational studies of F s have also been 
reported [39-41]. By explicit water simulations, Garcia and Sanbonmatsu [40] ob- 
tained a T m of 345 K, which is in reasonable agreement with the IR-based value. 
Using an earlier version of our model and ignoring the capping groups, a T m of 310 K 
was obtained [11]. In the present calculations, we include the Sue and NH 2 groups. 

Fig. 4a shows the helix content versus temperature as obtained from our F s calcula- 
tions. A two-state fit of the data gives T m = 304 ± 1 K, which is significantly lower 
than the IR-based result mentioned above but in perfect agreement with the CD 
studies, especially that of Thompson et al. [38]. For the energy difference, we obtain 
AE = 11.9 ± 0.3kcal/mol, which also agrees with what Thompson et al. found, 
namely AE = 12 ± 2kcal/mol. It may be worth noting that the experimental data 
that we compared with in the Trp cage case were based on CD rather than IR. 

In Fig. 4b we show the free energy F(A b , E) at 275 K. In the absence of a precise 
experimental structure for F s , we define A b as the (backbone) RMSD from an ideal 
a-helix (all residues). From the figure we see that the free energy has its global 
minimum at A b 0.5 A, which indeed corresponds to the a-helix. There are also 
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T (K) A b (A) 



Figure 4: Same as Fig. H for the F s peptide (T m = 304 ± IK, AE — 11.9 ± 
0.3 kcal/mol). 

two local minima at Ab ~ 7 A and Ab ~ 11 A, both of which correspond to /3-sheet 
structures. These two minima are very weakly populated compared to the a-helix 
minimum. 



3.3 GBlp and GBlm2/GBlm3 

Using exactly the same model, we now turn to /3-sheet peptides. That GBlp (GEW- 
TYDDATKTFTVTE), the 41-56-residue fragment from the protein G Bl domain, 
makes a /3-hairpin on its own was a breakthrough discovery [18] that has been followed 
by numerous atomic simulations of this particular sequence [5,42-50]. Recently, two 
mutants of GBlp with enhanced stability were designed [19], GBlm2 and GBlm3, 
by replacing the turn segment DDATKT by NPATGK. The mutant GBlm2 (GEW- 
TYNPATGKFTVTE) is identical to GBlp except for this change, while GBlm3 
(KKWTYNPATGKFTVQE) differs from GBlp at the chain ends as well. By CD 
and NMR, GBlm3 was estimated to be 86 ± 3% folded at 298 K and to have a T m 
of 333 ± 2 K, whereas GBlm2 was found to have a slightly lower folded population, 
74 ±5% at 298 K, and a T m of 320 ±2 K [19]. In the same study, GBlp was estimated 
to be ~ 30 % folded at 298 K. An earlier NMR study found GBlp to be 42 % folded 
at 278 K [18]. Both these estimates of native population for GBlp are low compared 
to the result of a Trp fluorescence study [51]; a two-state analysis of these data gave 
T m = 297K and AE = 11.6kcal/mol [51]. 

It turns out that our model fails to reproduce the experimental difference in stability 
between GBlm2 and GBlm3. In fact, GBlm2 and GBlm3 show nearly identical 
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275 300 325 350 375 

T(K) 

Figure 5: The hydrophobicity energy Eh p against temperature for GBlp (o) and 
GBlm3 (•). The lines are fits to Eq. El (T m = 297 ± 1 K, AE = 14.2 ± 0.2kcal/mol 
for GBlp; T m = 321 ± 1 K, AE — 15.0 ± 0.4kcal/mol for GBlm3). The points 
corresponding to the two highest temperatures were omitted for GBlp, as removing 
them resulted in a significantly better fit in terms of x 2 P er degree of freedom. 



behavior in our model. For clarity, we therefore show results only for one of these 
peptides, GBlm3, in the figures below. 

Fig. 5 shows the hydrophobicity energy E^ p against temperature for GBlp and 
GBlm3 in the model. We expect E^ p to be strongly correlated with Trp fluores- 
cence for these peptides, as Trp43 forms a hydrophobic cluster together with Tyr45, 
Phe52 and Val54. A two-state fit to our data for GBlp gives T m = 297 ± 1 K and 
AE = 14.2 ± 0.2kcal/mol, which indeed is in good agreement with the Trp fluores- 
cence results for this peptide (T m = 297 K, AE = 11.6 kcal/mol). The same type of fit 
gives T m = 321±1K and AE = 15.0 ± 0.4 kcal/mol for GBlm3, and T m = 322 ±2K 
and AE = 15.1 ± 0.4 kcal/mol for GBlm2. These two very similar T m estimates lie 
close to the experimental result for GBlm2 (320 ± 2K) and somewhat below that 
for GBlm3 (333 ± 2K). Our E^ p data indicate that GBlm2 and GBlm3 indeed are 
markedly more stable than GBlp in the model, which is confirmed by the results 
discussed next. 

Fig. 6a shows our data for the free energy F(A, E) for GBlp, at 275 K. On its own the 
GBlp fragment is believed to adopt a folded structure similar to that it has as part 
of the native protein G Bl domain, although the NMR restraints were insufficient 
to determine a unique structure for the excised fragment. As reference structure 
in the calculation of A, we therefore use the corresponding fragment of the NMR 
structure for the full protein G Bl domain (PDB code 1GB1, residues 41-56, first 
model) [52]. The heavy-atom RMSD A is used instead of the backbone RMSD A b , 
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because A b cannot distinguish between the two possible /3-hairpin topologies (with 
similar backbone folds but oppositely oriented side chains). We find that the two 
lowest minima of F(A,E), at A m 2.0 A and A « 3.2 A, both correspond to a (3- 
hairpin with the same topology and the same set of backbone hydrogen bonds as the 
reference structure. The main difference between these two minima lies in the shape 
of the turn region. In addition to these minima, there are two weakly populated 
local minima at A « 5.3 A and A « 8-10 A, which correspond to a /3-hairpin with 
the opposite topology and a-helix, respectively. The shape of F(A, E) for GBlp was 
also studied using earlier versions of our model [11,12]. The present model yields 
very similar results, with a minor enhancement of the two native-like minima at the 
expense of the two other local minima mentioned above. 

Fig. 6b shows the corresponding free-energy plot for GBlm3. As reference structure 
for GBlm3, we use a mutated and relaxed version of the GBlp reference structure. 
We see that F(A, E) has a simpler shape for GBlm3 than for GBlp. There is only 
one detectable free-energy minimum for GBlm3, and this minimum corresponds to 
a structure similar to the favored one for GBlp. 

Different experiments on GBlp have, as mentioned above, obtained different (3- 
hairpin populations. One way of estimating folded populations in the model is by 
two-state fits like those in Fig. 5. An independent and more direct estimate can be 
obtained by counting native backbone hydrogen bonds. To this end, we consider a 
hydrogen bond formed if its energy is less than — ej^/3. The number of native back- 
bone hydrogen bonds in a given conformation is denoted by N^. Fig. 7 shows the 
probability distribution of for GBlp and GBlm3 at 299 K, which is very close to 
the temperature (298 K) at which the folded populations of these two peptides were 
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Figure 7: Probability distribution of the number of native hydrogen bonds, N^, 
for GBlm3 (full line) and GBlp (dotted line) at 299 K. The hydrogen bonds taken 
as native are the same for both peptides. In GBlp notation, the native hydrogen 
bonds are Glu42(N)-Thr55(0), Glu42(0)-Thr55(N), Thr44(N)-Thr53(0), Thr44(0)- 
Thr53(N), Asp46(N)-Thr51(0), Asp46(0)-Thr51(N) and Asp47(O)-Lys50(N). 



compared by CD and NMR [19]. We find that the probability distribution P(A^*) 
has a clear bimodal shape for both peptides, with one native and one unfolded peak. 
The native peak is, as expected from the results above, significantly larger for the 
mutant GBlm3 than for GBlp. Taking conformations with > 3 as native and 
those with < 2 as unfolded, we obtain native populations of 82±1 % for GBlm3, 
84 ± 1 % for GBlm2, and 27 ± 2 % for GBlp. The overall agreement between these 
results and the experimental data (86 ± 3 % for GBlm3, 74 ± 5 % for GBlm2, ~ 30 % 
for GBlp) is very good, although the model slightly overestimates the folded fraction 
for GBlm2. Note that the native populations estimated from P^N^), thanks to the 
bimodality, are quite well determined, despite that the precise definition of native in 
terms of is somewhat arbitrary. 

For GBlm3, we find that one of the hydrogen bonds taken as native is very unlikely 
to form in our model, namely Pro47(O)-Gly50(N). As a result, conformations with 
N^ 1 = 7 are very rare (see Fig. 7). 

Our Eh p - and iV^f-based native populations for GBlp are different; from the E^ p data 
we obtain a native population of 46 % at 299 K, where the analysis gives 27 %. 
The magnitude of this difference is similar to that between different experiments. The 
iV^-based result is is good agreement with CD and NMR data, whereas the E^ p - 
based result agrees with Trp fluorescence data. For GBlm3 (and GBlm2), we do not 
know of any Trp fluorescence study. Our model suggests that the difference between 
different methods would be smaller in this case. Our P hp -based folded population at 
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Figure 8: (a) Probability distribution of the number of native backbone hydro- 
gen bonds, 7V£f, for LLM (full line) and Betanova (dotted line) at 287 K. (b) 
Frequencies of occurrence for the different native hydrogen bonds for Betanova 
(o) and LLM (•) at 287 K. In Betanova notation, the native hydrogen bonds are 
1: Ser4(N)-Thrll(0), 2: Ser4(0)-Thrll(N), 3: Gln6(N)-Lys9(0), 4: Gln6(0)- 
Lys9(N), 5: TyrlO(N)-Thrl7(0), 6: TyrlO(0)-Thrl7(N), 7: Asnl2(N)-Lysl5(0) and 
8: Asnl2(0)-Lysl5(N). 

299 K is 85% for GBlm3, which is close to our iV£ b at -based result of 82%. 



3.4 Betanova and LLM 

Betanova is a designed antiparallel three-stranded /5-sheet peptide with 20 residues 
(RGWSVQNGKYTNNGKTTEGR) [20], which is only marginally stable [21]. Re- 
cently, Betanova mutants with higher stability were developed [21], such as the triple 
mutant LLM (Val5Leu, Asnl2Leu, Thrl7Met). The NMR-based native populations 
of LLM and Betanova are 36% and 9%, respectively, at 283 K [21]. Results in good 
agreement with these estimates were obtained when testing an earlier version of our 
model on these two peptides [12]. Folding simulations of Betanova have also been 
performed by other groups, using coarse-grained [53] and atomic [54, 55] models. 

The folded structure of Betanova and LLM contains eight backbone hydrogen bonds, 
four in each of the two /3-hairpins. Fig. 8a shows the probability distribution of 
the number of native backbone hydrogen bonds, N^, in our model for LLM and 
Betanova, at 287 K. The distributions have three peaks. In addition to the folded 
and unfolded peaks at high and low N^, there is also a peak at N^ 1 = 4. Visual 
inspection of snapshots from the simulations reveals that conformations at this peak 
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Figure 9: (a) The hydrophobicity energy against temperature for Betanova (o) 
and LLM (•). The lines are fits to Eq. El (T m = 314 ± 1, AE = 8.9 ± O.lkcal/mol 
for Betanova; T m = 302 ± 1 K, AE = 10.9 ± 0.2kcal/mol for LLM). (b) Free energy 
F(A, E) for Betanova at 275 K. Contour levels are as in Fig. O). 



tend to contain the first (N-terminal) /5-hairpin but not the second (C-terminal) one. 
This conclusion, which is in agreement with experimental data [21], is confirmed by 
the frequencies of occurrence of the individual hydrogen bonds, shown in Fig. 8b. We 
see that the hydrogen bonds of the first /9-hairpin (1-4) occur more frequently than 
those of the second /3-hairpin (5-8), especially for Betanova. For a conformation to 
be counted as folded, we require that > 6. With this definition, we find that 
Betanova and LLM are 6 ± 1 % and 38 ± 2 % folded, respectively, at 287 K, which is 
in good agreement with the experimental results (9% and 36% at 283 K). 

The melting behavior has, as far as we know, not been studied experimentally for 
Betanova or LLM. In Fig. 9a we show melting curves for these peptides in our model. 
As in the /?-hairpin case, we consider the hydrophobicity energy E^ p . Betanova has 
fewer hydrophobic residues than LLM, and we see that E^ p is much lower in absolute 
value for Betanova than for LLM. In our model, the difference in hydrophobicity is the 
main reason why LLM is more stable than Betanova. A two-state analysis of our E^p 
data gives T m = 314±1 and AE = 8.9±0.1 kcal/mol for Betanova, and T m = 302±1 K 
and AE = 10.9 ± 0.2 kcal/mol for LLM. These fitted two-state parameters contrast 
sharply with the results of the analysis above, especially for Betanova. In fact, 
for Betanova, the fitted two-state parameters correspond to a native population of 
80% at the temperature 287 K, at which Betanova was estimated above to be only 
6 % folded. This discrepancy between the native populations obtained using E hp 
and data clearly show that, in our model, these two peptides do not behave as 
ideal two-state systems. It is worth noting that the quality of the two-state fits in 
Fig. 9a, nevertheless, is very good, which illustrates that deviations from the simple 
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two-state picture can be very hard to detect from the temperature dependence of a 
single quantity [56]. 

Fig. 9b shows the free energy F(A, E) for Betanova at 275 K. Like for the /3-hairpins, 
we use all the heavy atoms in the RMSD, but limit the comparison to the residues 
3-18. The residues 1, 2, 19 and 20 do not participate in the /3-sheet structure. There 
is a local minimum at A as 3.2 A representing the state obtained in our model that 
most resembles the NMR structure. That this state is not the most probable state in 
the model is consistent with the low native population found experimentally for this 
peptide. The corresponding graph for LLM shows a much more prominent minimum 
representing the native conformation. 



3.5 The character of the melting transition 



For GBlp, Betanova and LLM, we saw above that the apparent native population 
depends on which quantity we study. This dependence reflects the fact that these 
peptides do not show ideal two-state behavior in the model. A quantity for which 
we obtain a relatively high apparent melting temperature not only for these three 
peptides but for all the peptides studied, is the radius of gyration, R g . The T m 
values obtained from our R g data for F s and the Trp cage are 29 K and 9 K higher, 
respectively, than what we found above using the helix content. For GBlm3, our R g 
data gives a T m that is 6 K higher than that obtained above using the hydrophobicity 
energy. These comparisons show that none of the peptides studied behaves as a 
perfect two-state system in our model, although the deviations from this behavior 
might be relatively small for some of them, such as GBlm3. 

One measure of the sharpness of the melting transition is the height of the peak 
in the specific heat, C v . In Fig. 10, we show specific heat curves for the different 
peptides studied. The results for GBlm2 are again very similar to those for GBlm3 
and therefore omitted. The specific heat exhibits a clear peak for all the peptides 
studied, but the height of the peak varies. The peak is highest for GBlm3, indicating 
that the melting transition is most two-state-like for this peptide. A comparison of the 
energy distributions of the different peptides (not shown) supports this conclusion. 
For GBlm3, we find that the energy distribution has a bimodal shape, although not 
very pronounced. The other peptides all have wide but single-peaked distributions. 
The distribution is particularly wide, virtually flat, for GBlp, which has the next 
highest peak in C v . 
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Figure 10: The specific heat C v against temperature for the different peptides, as 
obtained using histogram reweighting techniques [57]. For each peptide, a band is 
shown. The band is centered around the expected value and shows statistical la 



errors. C v is defined as C v = N~ 1 d(E}/dT = (NkT 2 )- l ((E 2 ) - (E) 2 ), where N is 



the number of amino acids and (O) denotes a Boltzmann average of variable O. 

For the peptide with the sharpest transition, GBlm3, we find that the specific heat 
maximum, 316 K, is located near the temperature at which its folded population is 
50 %. The other peptides are less than 50 % folded at their specific heat maxima, 
especially Betanova. Betanova was estimated above to be 6% folded at 287 K in the 
model, and has its specific heat maximum at a temperature higher than that, 293 K. 



We have developed an atomic model with a simplified phenomenological potential for 
folding studies of polypeptide chains, which was tested on a set of peptides with about 
20 amino acids each, namely the Trp cage, F s , GBlp, GBlm2, GBlm3, Betanova 
and LLM. First of all, our study shows that the model folds these different sequences 
to structures similar to their experimental structures, for one and the same choice of 
model parameters. In addition, we investigated the stability and melting behavior 
of the peptides. The following list is a brief summary of these calculations, focusing 
on the observables expected to be correlated with the corresponding experimental 
probes. 

• The helix content of the Trp cage shows a temperature dependence that is in 
good agreement with experimental data based on CD and NMR (see Fig. 3). 



4 Conclusion 



19 



Exp. Model 

GBlp -30% (298 K) 27 ± 2 % (299 K) 
GBlm2 74 ± 5 % (298 K) 84 ± 1 % (299 K) 
GBlm3 86 ± 3 % (298 K) 82 ± 1 % (299 K) 
Betanova 9 % (283 K) 6 ± 1 % (287 K) 

LLM 36% (283 K) 38 ±2% (287 K) 

Table 2: Folded populations of the different /3-sheet peptides in the model, along 
with experimental results. The experimental data on GBfp, GBf m2 and GBlm3 are 
from Fesinmeyer et al. [19], whereas those on Betanova and LLM are from Lopez de 
la Paz et al. [21]. 



• A two-state analysis of the helix content of F s gives T m and AE values that are 
in good agreement with CD data, while the T m value is somewhat low compared 
to its IR-based value. 

• Estimates of folded populations based on native hydrogen bond data for the 
/3-sheet peptides GBlp, GBlm2, GBlm3, Betanova and LLM are in good agree- 
ment with CD- and NMR-based experimental results, as is summarized in Ta- 
ble 2. Recall that the energy scale was set using the a-helical Trp cage. 

• Experimentally, GBlp has been studied using Trp fluorescence as well, which 
gave a folded population higher than that in Table 2. Our results based on 
hydrophobicity energy data are in good agreement with those from the Trp 
fluorescence study. 

The model fails to reproduce the difference in folded population between the two 
stable mutants of GBlp (see Table 2), which in part may be due to the fact that 
Coulomb interactions between side-chain charges are ignored; GBlm3 contains some 
charged residues that are missing in GBlm2. The overall quantitative agreement with 
experimental data is, nevertheless, excellent. This agreement indicates that factors 
such as Coulomb interactions between charged residues play a quite limited role in 
the folding thermodynamics of these peptides, compared to hydrogen bonding and 
hydrophobic attraction, which are the main driving forces of the model. 

The temperature dependence of the model is, to us, surprisingly good, for two reasons. 
First, the temperature dependence was not considered at all when calibrating the 
model, except in the determination of the energy scale. A considerable amount of 
fine-tuning was required in order to obtain proper folded structures, but no further 
fine-tuning was performed once that goal had been achieved. Second, our calculations 
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do not involve any reparametrization of the energy function. In other words, the 
parameters of the energy function are temperature independent, which is a simplifying 
assumption rather than a controlled approximation. On the other hand, it should be 
noted that the melting transition is not triggered by a sudden change in, for example, 
the strength of the hydrophobic attraction. 

In the development of this model, we have taken a purely phenomenological approach. 
The model will be further developed by studying new amino acid sequences, which 
will impose new conditions on the interaction potential. As before, the challenge will 
be to do this in a backwards compatible manner; the model must not lose its ability 
to fold previously studied sequences. As to limitations of the current version of the 
model, we know that it is unable to properly fold the so-called trpzip /9-hairpins [58] , 
which make /3-hairpins in the model but with the wrong topology. We also expect 
that refinement of the model will be needed as the chains get larger. For example, 
as mentioned earlier, it is likely that our pair-wise additive hydrophobicity potential 
will have to be supplemented with multibody terms for large chains. Finding out how 
to change the model in order to make it more general without losing computational 
efficiency will not be an easy task, but the results obtained so far makes it tempting 
to try. 
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